Abstract

At an early stage in pre-biotic evolution, groups of replicat- 
ing molecules must coordinate their reproduction to form ag- 
gregated units of selection. Mechanisms that enable this to 
occur are currently not well understood. In this paper we 
introduce a deterministic model of primitive replicating ag- 
gregates, proto-organisms, that host populations of replicat- 
ing information carrying molecules. Some of the molecules 
promote the reproduction of the proto-organism at the cost of 
their individual replication rate. A situation resembling that 
of group selection arises. We derive and analytically solve 
a partial differential equation that describes the system. We 
find that the relative prevalence of fast and slow replicators 
is determined by the relative strength of selection at the ag- 
gregate level to the selection strength at the molecular level. 
The analysis is concluded by a preliminary treatment of finite 
population size effects. 

Introduction 

In primitive organisms without central control of genome 
replication, a conflict between selfishly reproducing genes 
and genes useful for the replication of the whole organism 
may occur. This raises the question of how, and when, the
organism as a whole can be viewed as a unit of selection. 
This is a necessary condition for such systems to evolve into 
contemporary organisms, with a well-defined separation be- 
tween the genotype and phenotype, and a coordinated repli- 
cation. 

We study the evolutionary dynamics of systems consisting
of self-assembling container aggregates that contain
populations of self-replicating, information-carrying
molecules, which we call proto-genes. The aggregates can be
viewed as primitive proto-organisms, each with a genome
consisting of an evolving population of proto-genes. The
aggregates grow by successively incorporating new building
blocks. Eventually they become unstable and spontaneously
divide, whereby a replication of the proto-organism has
occurred. The production of new building blocks, e.g.
amphiphilic polymers, is catalyzed by the proto-genes, e.g.
through an electron charge transfer process. A strain's
ability to self-replicate and its chemical properties
critical to the growth of the aggregate are assumed to be
uncorrelated. Certain strains of proto-genes are efficient
as self-replicators, whereas other strains are more active
in the production of new building blocks, and thereby
contribute more to the reproduction of the container. The
evolution of the system as a whole is then characterized by
a conflict reminiscent of group selection.

What conditions enable co-existence of selfish genomes
and locally suppressed genomes whose presence is
advantageous to the population they are members of? Or, in
broader terms, what conditions allow a trade-off between
local reproduction of individual sequences and global
reproduction of the proto-containers that enclose them?

Background 
The quasi-species model 

As the quasi-species framework serves as the basis for the
current container growth model, we first introduce it
briefly. The quasi-species model was originally formulated
by Eigen (Eigen, 1971) as a way to describe and analyse
pre-biotic molecular replicator dynamics. The individuals
are bit-strings of finite length, representing sequences of
elementary building blocks or bases, and their traits
determine their expected number of offspring per time unit.
As a simple yet illuminating case, a single-peak fitness
landscape is often assumed. That is, all individuals are
assigned an equal ability to reproduce, except for one, the
master sequence, which is given a higher fitness. Selection
is counteracted by variation, which arises from the limited
accuracy of the asexual copying process from parent to
offspring (i.e. mutations).

Let $x_k$ denote the relative frequency of individual $k$. The
replicator dynamics of the population is then described by
the rate equations

$$\dot{x}_k = \sum_l Q_{kl}\, a_l\, x_l - e\, x_k, \tag{1}$$

where $a_l$ is the fitness of individual $l$, $Q_{kl}$ is the
probability that reproduction of individual $l$ gives
individual $k$ as offspring, and where $e = \sum_i a_i x_i$ is the
average fitness of the population. The second term, $e x_k$,
ensures normalised concentrations.
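The rate equations (1) can be integrated numerically for a small sequence space. The following Python sketch assumes independent per-bit copying errors with probability `mu`, so that the mutation probability depends only on the Hamming distance between two sequences. This error model, the forward-Euler scheme, the parameter values, and the names `mutation_matrix` and `integrate_quasispecies` are illustrative assumptions, not part of the model description.

```python
import itertools
import numpy as np

def mutation_matrix(nu, mu):
    """Q[k, l]: probability that copying sequence l yields sequence k,
    assuming independent per-bit errors with probability mu (an assumption)."""
    seqs = np.array(list(itertools.product([0, 1], repeat=nu)))
    # Hamming distance between every pair of bit-strings.
    d = (seqs[:, None, :] != seqs[None, :, :]).sum(axis=2)
    return mu ** d * (1.0 - mu) ** (nu - d)

def integrate_quasispecies(a, Q, x0, dt=0.01, steps=5000):
    """Forward-Euler integration of dx_k/dt = sum_l Q[k,l] a[l] x[l] - e x[k]."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        e = a @ x                          # average fitness of the population
        x = x + dt * (Q @ (a * x) - e * x)
    return x

nu, mu, A = 4, 0.05, 2.0                   # illustrative values
a = np.ones(2 ** nu)
a[0] = A                                   # single peak at the master sequence 0000
Q = mutation_matrix(nu, mu)
x = integrate_quasispecies(a, Q, np.full(2 ** nu, 1.0 / 2 ** nu))
print("sum of frequencies:", x.sum(), "master frequency:", x[0])
```

Because every column of `Q` sums to one, the normalisation term keeps the frequencies summing to one at each step.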



Given large sequence lengths $\nu$ and a low mutation rate
$\mu$, a useful approximation is possible by acknowledging that
there is a low probability of mutating from a background
sequence (that is, any sequence that is not a master
sequence) onto a master sequence (Nowak and Schuster, 1989).
When employing this approximation, the population is
considered to consist of two types, master and background
sequences, and the time dynamics is reduced to

$$\dot{\xi} = (A - 1)\,\xi\,(\xi^* - \xi), \tag{2}$$

where $\xi$ is the concentration of master sequences,
$\xi^* = (AQ - 1)/(A - 1)$ is its equilibrium value, $A = a_0$ is
the fitness advantage of the master sequence, and the
copying fidelity $Q$ is the probability that there are no
mutations during a replication event. The background
sequences all have fitness 1, so the factor $A - 1$ is the
relative fitness advantage of the master sequence.
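The convergence implied by (2) can be checked with a few lines of Python. This is a minimal sketch using forward-Euler steps; the function names and the step size are illustrative assumptions, and $Q = 0.8$ is chosen only because, together with $A = 2$, it reproduces the equilibrium value 0.6 quoted in the caption of Fig. 2.

```python
def xi_star(A, Q):
    """Equilibrium of (2): xi* = (A*Q - 1) / (A - 1)."""
    return (A * Q - 1.0) / (A - 1.0)

def integrate_master(xi0, A, Q, dt=0.001, t_end=20.0):
    """Forward-Euler integration of d(xi)/dt = (A - 1) xi (xi* - xi)."""
    xs = xi_star(A, Q)
    xi = xi0
    for _ in range(int(round(t_end / dt))):
        xi += dt * (A - 1.0) * xi * (xs - xi)
    return xi

A, Q = 2.0, 0.8                            # illustrative values giving xi* = 0.6
finals = [integrate_master(x0, A, Q) for x0 in (0.01, 0.3, 0.9)]
print(xi_star(A, Q), finals)
```

Trajectories started below and above the equilibrium both approach it, while a population started at exactly $\xi = 0$ would stay there.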

The most important result from this model concerns the
existence of a sharp lower limit to the copying fidelity,
the error threshold, below which no information can be
preserved in the population by means of the selective
pressure (Eigen, 1971).
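Within the two-type approximation, the location of the threshold can be read off from the equilibrium concentration $\xi^* = (AQ - 1)/(A - 1)$ of the master sequence. The short derivation below is only a sketch under that approximation (no back mutations, $A > 1$); the symbol $Q_c$ for the critical fidelity is introduced here for illustration.

```latex
\xi^* = \frac{AQ - 1}{A - 1} > 0
\quad\Longleftrightarrow\quad
Q > Q_c \equiv \frac{1}{A},
\qquad \text{e.g. } A = 2 \;\Rightarrow\; Q_c = \tfrac{1}{2}.
```

For $Q \le Q_c$ the only non-negative equilibrium is $\xi = 0$, i.e. the master sequence is lost from the population.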

Eigen's paradox 

An implication of the error threshold is that large early
molecular replicators, with lengths comparable to, say, RNA
viruses of today, had to reproduce with very high accuracy.
Due to this requirement, specialised enzymes had to be
utilised in order to correct for mutations. However, these
enzymes, in turn, could only be encoded by long nucleotide
sequences. That is, large sequences required enzymes that
required large sequences. In order to resolve this recursive
problem, Eigen proposed the hyper-cycle: a mechanism by
which a set of sequences cooperatively, by means of
auto-catalysis, share the burden of information carriage
(Eigen and Schuster, 1977). These constructions, though,
are presumed to be highly vulnerable to parasites, i.e.
molecules that benefit from catalytic support without
contributing to the auto-catalytic cycle, and therefore do
not serve as a solution to the information storage dilemma
in a harsh pre-biotic environment.

Group selection 

An alternative architectural principle that in a more sta- 
ble manner would allow for cooperation among information 
carriers — organised in hyper-cycles or not — is to form com- 
partments. When realised, a compartment and the template 
molecules that it encloses may under evolution be viewed as 
a unit of selection whose absolute fitness is determined by 
the composition of its contents. 

The situation described constitutes group selection
as originally studied by Wright in his island model
of spatially isolated local and macroscopic populations
(Wright, 1931). In a similar setting, (Levins, 1970)
and later (Boorman and Levitt, 1980) study extinction
and re-colonisation of, and by, locally evolving
populations. At a smaller scale, (Szathmary and Demeter,
1987; Smith and Szathmary, 1995) analyse group selection on
the level of replicative molecules. In their stochastic
corrector model, small compartments encapsulate replicators
of two given types, fast and slow, where the latter benefit
the survival of the group and the former do not. Given that
the groups consist of few molecules, thus implying a high
degree of stochasticity in the system, it is shown
numerically that, under certain conditions, there exists a
stable global polymorphism of fast and slow replicators.
Similarly, (Alves et al., 2001) adapt Wright's island model
to the domain of molecular replicators. Again, two template
types, fast and slow, enclosed in finite (although not
necessarily small) compartments are considered. Consistent
with the previously mentioned work, parameter regions that
enable stable coexistence of the two types are found. In
more recent work (Fontanari et al., 2005), the above model
is generalised to up to four (as limited by numerical
constraints) different template types, where vesicles
containing populations with a high diversity of template
types are favoured. The dynamics is evaluated by numerically
iterating a set of recursion equations, from which the
regions of the model's parameter space that enable
coexistence of up to the four template types are identified.

The full population dynamics 

Consider a population of proto-containers, where each
container hosts a population of individuals as formulated in
the quasi-species framework. Since the populations of the
separate containers are isolated from each other, each
population evolves individually. In accordance with the
original quasi-species setting, there is one master sequence
with a higher reproduction rate. However, there is also
another sequence, the one that is furthest away from the
master sequence in terms of Hamming distance, that promotes
the growth of the whole container (Fig. 1). At a certain
size, the container spontaneously divides. This constitutes
a replication of the proto-organism. Since the
proto-containers are subject to selection, they in turn,
like the individual sequence populations, compete for
maintenance.

The slow replicator that promotes the growth of the
container is not favoured within the populations, due to
local domination of the master sequence. On the scale of the
containers, though, the slow replicator is presumed to have
an advantage since it enhances the fitness of its host
container.

Figure 1: Sequence space for $\nu = 4$ with a fitness peak at
the master sequence 0000 and an aggregate growth peak at the
slow replicator 1111.

We assume that the growth of an aggregate is directly
determined by its internal concentration through some
function $\phi(x)$. Let $\psi(t,x)$ denote the relative
concentration of aggregates that contain a population of
information molecules with concentration vector $x$ at time
$t$. We use a standard argument, see (Boorman and Levitt,
1980) for details, for the flux in the non-normalised
concentration density $\tilde{\psi}(t,x)$ in a volume $V$ in
$x$-space:

$$
\begin{aligned}
\frac{d}{dt} \int_V dx\, \tilde{\psi}(t,x)
 &= \text{in/out-flux} + \text{production} \\
 &= -\int_S dS \cdot \dot{x}\, \tilde{\psi}(t,x) + \int_V dx\, \phi(x)\, \tilde{\psi}(t,x) \\
 &= \int_V dx \left( -\nabla \cdot [\dot{x}\, \tilde{\psi}(t,x)] + \phi(x)\, \tilde{\psi}(t,x) \right),
\end{aligned}
$$



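where $S$ is the surface enclosing the volume $V$, and $dS$ is a
vector-valued surface element pointing in the direction
normal to the surface. Since the volume is arbitrary, the
continuity (Euler) equation for the normalised density
$\psi(t,x)$ reads

$$\partial_t \psi(t,x) = -\nabla \cdot [\dot{x}\, \psi(t,x)] + \phi(x)\, \psi(t,x) - (\phi,\psi)\, \psi(t,x), \tag{3}$$

where, just as in the regular quasi-species equation, the
scalar product $(\phi,\psi)(t) = \int dx\, \phi(x)\, \psi(t,x)$ is used to
normalise the relative concentrations so that
$\int dx\, \psi(t,x) = 1$ for all $t$. We use the explicit form of
$\dot{x}$ given in (1) to rewrite (3) as

$$\partial_t \psi(t,x) = -x^T M^T \nabla\psi(t,x) + a^T X\, \nabla\psi(t,x) + \tilde{\phi}(x)\, \psi(t,x) - (\phi,\psi)\, \psi(t,x), \tag{4}$$

where $M$ is the matrix of the linear part of (1), with
elements $M_{kl} = Q_{kl} a_l$ (mutation probability times
fitness), the matrix $X$ is defined as $X_{ij} = x_i x_j$, and the
effective growth is defined as

$$\tilde{\phi}(x) = \phi(x) + (N + 1)\, a^T x - \mathrm{Tr}(M),$$

where $N$ is the number of different information molecules,
i.e., $x_i$, $i = 1, \ldots, N$, and $\mathrm{Tr}(M) = \sum_i M_{ii}$. We note
that (4) is only nonlinear in the re-normalisation term. A
transformation

$$\tilde{\psi}(t,x) = \exp\left[ \int_0^t ds\, (\phi,\psi)(s) \right] \psi(t,x)$$

gives a linear equation in the new, non-normalised,
"distribution":

$$\partial_t \tilde{\psi}(t,x) = -x^T M^T \nabla\tilde{\psi}(t,x) + a^T X\, \nabla\tilde{\psi}(t,x) + \tilde{\phi}(x)\, \tilde{\psi}(t,x). \tag{5}$$

The normalised distribution is given by

$$\psi(t,x) = \frac{\tilde{\psi}(t,x)}{\int dx\, \tilde{\psi}(t,x)}.$$

Figure 2: Trajectories $\xi(t)$ in the quasi-species dynamics
($\mu = 0.2$ and $A = 2$). All trajectories converge to $\xi = \xi^*$
(here, $\xi^* = 0.6$); as a consequence, all populations with
$\xi(t) < \xi^*$ for $t > 1$ must have $\xi(0) < 1$.

A two-state approximation

The full version of the coupled population dynamics
presented in the previous section is hard to treat
analytically. In this section we present a simplified set of
equations that readily allows analytic treatment. The main
idea is to use the same "no back mutations" approximation of
the quasi-species dynamics as was used to derive (2).

We make a two-state approximation, assuming that the
main dynamics in the model is captured by the relative
concentration of fast replicators $\xi$ alone; all background
sequences $1 - \xi$ are assumed to be beneficial for the
aggregate growth. Using (2), we can write (5) as

$$\partial_t \tilde{\psi}(t,\xi) = -\partial_\xi \left[ \xi(\xi^* - \xi)\, \tilde{\psi}(t,\xi) \right] + \gamma (1 - \xi)\, \tilde{\psi}(t,\xi), \tag{6}$$

where $\xi^* = (AQ - 1)/(A - 1)$ is the equilibrium of the
master sequence population dynamics and the parameter
$\gamma$ is defined as

$$\gamma = R/(A - 1).$$

Here $R$ sets the scale of the aggregate growth,
$\phi(\xi) = R(1 - \xi)$, and time is measured in units of
$1/(A - 1)$. Eq. (6) can be solved analytically:

$$\tilde{\psi}(t,\xi) = \xi^{\alpha}\, |\xi^* - \xi|^{-\beta}\, F\!\left( \frac{\xi^* - \xi}{\xi}\, e^{\xi^* t} \right), \tag{7}$$

where $F$ is a function determined by the initial
distribution, and the parameters are defined as
$\alpha = \gamma/\xi^* - 1$ and $\beta = 1 + \gamma(1/\xi^* - 1)$. We now solve
for $F$ as a function of the initial distribution
$\psi_0(\xi) = \tilde{\psi}(0,\xi) = \psi(0,\xi)$. Defining
$\eta = (\xi^* - \xi)/\xi$ gives

$$F(\eta) = (\xi^*)^{\beta - \alpha}\, (1 + \eta)^{\alpha - \beta}\, |\eta|^{\beta}\, \psi_0\!\left( \frac{\xi^*}{1 + \eta} \right). \tag{8}$$

Substituting (8) back into (7) and, where it is convenient,
using the relation $\alpha - \beta = \gamma - 2$, gives the final solution

$$\tilde{\psi}(t,\xi) = (\xi^*)^{2 - \gamma}\, e^{\beta \xi^* t} \left[ \xi + (\xi^* - \xi)\, e^{\xi^* t} \right]^{\gamma - 2} \psi_0\!\left( \frac{\xi^* \xi}{\xi + (\xi^* - \xi)\, e^{\xi^* t}} \right) \tag{9}$$

when $0 < \xi < \xi^* / [1 - (1 - \xi^*)\, e^{-\xi^* t}]$ and, as a
consequence of $\psi_0(\zeta) = 0$ when $\zeta > 1$, zero otherwise.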
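A closed-form solution of this kind can be checked against a direct integration along the characteristics of the two-state equation. The sketch below assumes the two-state equation in the form $\partial_t \tilde{\psi} = -\partial_\xi[\xi(\xi^* - \xi)\tilde{\psi}] + \gamma(1 - \xi)\tilde{\psi}$ and the closed form $\tilde{\psi}(t,\xi) = (\xi^*)^{2-\gamma} e^{\beta\xi^* t}[\xi + (\xi^* - \xi)e^{\xi^* t}]^{\gamma-2}\,\psi_0(\cdot)$, with $\beta = 1 + \gamma(1/\xi^* - 1)$; both are this text's reading of the analysis, and the function names, the RK4 scheme and the initial density `psi0` are illustrative choices.

```python
import math

def psi_tilde(t, xi, xs, gamma, psi0):
    """Assumed closed form of the non-normalised density at (t, xi)."""
    beta = 1.0 + gamma * (1.0 / xs - 1.0)
    denom = xi + (xs - xi) * math.exp(xs * t)
    if denom <= 0.0:
        return 0.0
    origin = xs * xi / denom               # where this characteristic started
    if origin > 1.0:
        return 0.0
    return (xs ** (2.0 - gamma) * math.exp(beta * xs * t)
            * denom ** (gamma - 2.0) * psi0(origin))

def follow_characteristic(xi0, t_end, xs, gamma, psi0, steps=20000):
    """RK4 along d(xi)/dt = xi (xs - xi),
    d(ln psi)/dt = gamma (1 - xi) - (xs - 2 xi)."""
    def rhs(xi):
        return xi * (xs - xi), gamma * (1.0 - xi) - (xs - 2.0 * xi)
    h = t_end / steps
    xi, log_psi = xi0, math.log(psi0(xi0))
    for _ in range(steps):
        k1 = rhs(xi)
        k2 = rhs(xi + 0.5 * h * k1[0])
        k3 = rhs(xi + 0.5 * h * k2[0])
        k4 = rhs(xi + h * k3[0])
        xi += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        log_psi += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return xi, math.exp(log_psi)

xs, gamma = 0.6, 1.5                       # illustrative parameter values
psi0 = lambda x: 1.0 + x                   # arbitrary smooth initial density
xi_t, numeric = follow_characteristic(0.3, 3.0, xs, gamma, psi0)
print(numeric, psi_tilde(3.0, xi_t, xs, gamma, psi0))
```

The two printed numbers should agree to within the integration error, both for starting points below and above $\xi^*$.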

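For large $t$, (9) is significantly simplified. We need to
consider two regions separately, since close to the
singularity $\xi = \xi^*$ the solution behaves differently than
elsewhere:

$$\tilde{\psi}(t,\xi) \approx \exp(\beta \xi^* t)\, \psi_0(\xi^*)$$

when $|\xi - \xi^*| < \exp(-\xi^* t)$, whereas

$$\tilde{\psi}(t,\xi) \approx \exp(\alpha \xi^* t) \left( \frac{\xi^* - \xi}{\xi^*} \right)^{\gamma - 2} \psi_0(0)$$

otherwise. The weight of the population located around
$\xi = \xi^*$, i.e., the fast replicators, grows like
$W_\beta \sim \exp(\beta \xi^* t) \exp(-\xi^* t)$, while the rest of the
population has weight $W_\alpha \sim \exp(\alpha \xi^* t)$. Since
$W_\alpha / W_\beta \sim \exp[(\gamma - 1)\, \xi^* t]$, the conclusion is that
if $\gamma < 1$, then the entire population will be concentrated
in an infinitesimal surrounding of $\xi = \xi^*$, i.e., the fast
replicators dominate the population.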

If $\gamma > 1$, the distribution is given by

$$\psi_{\mathrm{stat}}(\xi) = \frac{1}{Z}\, (\xi^* - \xi)^{\gamma - 2}, \qquad 0 \le \xi < \xi^*, \tag{10}$$

where $Z$ is just a normalisation factor. We note that the
weight, i.e., the integral of the distribution, close to
$\xi = \xi^*$ converges when $\gamma > 1$. Eq. (10) also changes
behaviour when $\gamma - 2 = 0$. Then the population changes from
being dominated by fast replicators to a situation where the
slow replicators dominate: the weight of the distribution
shifts towards $\xi = 0$.
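The shift of weight at $\gamma = 2$ can be made concrete with a small calculation. The sketch below assumes that the stationary density is proportional to $(\xi^* - \xi)^{\gamma - 2}$ on $[0, \xi^*)$, which is the form suggested by the large-$t$ limit above; the closed-form mean $\xi^*/\gamma$ and cumulative weight follow from that assumption, and all function names are illustrative.

```python
def stationary_mean_exact(xs, gamma):
    """Mean of a density proportional to (xs - xi)^(gamma - 2) on [0, xs)."""
    return xs / gamma

def mass_below(x, xs, gamma):
    """Fraction of the stationary weight with xi <= x (closed-form CDF)."""
    return 1.0 - ((xs - x) / xs) ** (gamma - 1.0)

def stationary_mean_numeric(xs, gamma, n=20000):
    """Midpoint-rule check of the mean; reliable for gamma >= 2, where the
    integrand has no singularity at xi = xs."""
    h = xs / n
    num = den = 0.0
    for i in range(n):
        xi = (i + 0.5) * h
        w = (xs - xi) ** (gamma - 2.0)
        num += xi * w
        den += w
    return num / den

xs = 0.6                                   # illustrative equilibrium value
for gamma in (1.2, 2.0, 5.0):
    print(gamma, stationary_mean_exact(xs, gamma), mass_below(xs / 2, xs, gamma))
```

Under this assumption, less than half of the weight lies below $\xi^*/2$ for $\gamma < 2$, exactly half for $\gamma = 2$, and more than half for $\gamma > 2$.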

Our analysis of the asymptotic behaviour assumes that
the initial distribution is smooth. Most importantly, we
assume that $\psi_0(\xi)$ is regular at $\xi = 0$. It is clear already
from (6) that $\xi = 0$ is a fixed point of the dynamics. If,
for example, the initial distribution has a delta function
at $\xi = 0$, i.e. $\psi_0(\xi) = b\,\delta(\xi) + \chi(\xi)$ where $\chi$ is a smooth
function, then the weight of the distribution around $\xi = 0$
grows as $\exp(\gamma t)$ relative to the weight of the rest of the
interval. Clearly then, the asymptotic distribution is
$\psi_{\mathrm{stat}}(\xi) = 2\delta(\xi)$, i.e. there are only slow replicators in
the population. The conclusion is that the stationary
distribution is completely dominated by slow replicators if
there initially exist containers with no fast replicators.
If the population within an aggregate is finite (see the
discussion in the next section), this situation is not
unlikely. In fact, this observation is used in the
stochastic corrector model of group selection (Szathmary and
Demeter, 1987). We conclude the analysis by reviewing the
three different cases, remembering the definition
$\gamma = R/(A - 1)$:

I. $R < 2(A - 1)$, $\psi_0(\xi)$ regular at $\xi = 0$: The fast
replicators dominate the total population.

II. $R > 2(A - 1)$, $\psi_0(\xi)$ regular at $\xi = 0$: The slow
replicators dominate the total population but some fast
replicators still exist.

III. $\int_0^{\epsilon} d\xi\, \psi_0(\xi) > \delta > 0$ when $\epsilon \to 0^+$: The slow
replicators completely dominate the total population.

Note that the condition $R \gtrless 2(A - 1)$ is independent of the
copying fidelity $Q$. This can be understood from (2), where
we see that $A - 1$ measures the rate of convergence towards
the stationary distribution.

Finite size effects 

The results in the previous section were derived in the
infinite population size limit. In this section, we
introduce and analyse a variant of the model with a finite
number $n$ of containers. Each container $i$ in the population
is characterised by the fraction $\xi_i$ of information molecules
in the container that replicate efficiently. As before, it
is assumed that within each container, the fraction $\xi_i$
evolves according to the quasi-species dynamics in (2).
Container $i$ divides according to an inhomogeneous Poisson
process with (instantaneous) rate $\gamma [1 - \xi_i(t)]$,
corresponding to the second term in (6). We assume that the
container divides into two equal halves, with identical
composition of information molecules.

In our simulations, we take the initial population, $\xi_i(0)$,
to be independent, uniformly distributed random numbers. The
time steps in the simulations consist of two parts. First,
the value $\xi_i$ of each container $i$ is updated as

$$\xi_i(t + \delta t) = \xi_i(t) + \delta t\, \xi_i(t)\, [\xi^* - \xi_i(t)]. \tag{11}$$

Second, each container is tested for division. With
probability $1 - \exp(-\gamma\, \delta t\, (1 - \xi_i))$ we copy container $i$
to a randomly chosen container in the population. This
allows for the correct rate of division for each container,
while the number of containers in the population is kept
constant. When $\gamma\, \delta t \ll 1$, the probability of division is
approximately $\gamma\, \delta t\, (1 - \xi_i)$. Throughout this section, we
use $\delta t = 0.01$ and

Figure 3: Development of the expected value of $\xi$.
Simulation results, averaged over 100 runs, are shown as
lines decorated with symbols for $\gamma = 0.5$ (circles),
$\gamma = 1.2$ (boxes), $\gamma = 2$ (diamonds), and $\gamma = 5$ (triangles).
Also shown is (13) for each value of $\gamma$ (thick lines).

Figure 4: Expected value of $\xi$ from simulations and from
(15), averaged over $\xi_{\min}$. Simulation results, averaged over
100 runs, are shown as lines decorated with symbols for
$\gamma = 0.5$ (circles), $\gamma = 1.2$ (boxes), $\gamma = 2$ (diamonds), and
$\gamma = 5$ (triangles).
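The two-part time step of the finite-population model can be sketched in Python with NumPy. This is a minimal sketch, not the simulation code behind the figures: the population size `n`, the random seed, and the synchronous handling of divisions within one time step (all parents are read before any container is overwritten) are assumptions made here for illustration.

```python
import numpy as np

def simulate(n, gamma, xi_star=1.0, dt=0.01, t_end=4.0, seed=0):
    """Finite-population sketch: n containers, growth update then division.

    n and seed are illustrative choices, not values taken from the text.
    Returns the times and the population mean of xi at each time step.
    """
    rng = np.random.default_rng(seed)
    xi = rng.uniform(0.0, 1.0, size=n)     # independent uniform initial fractions
    steps = int(round(t_end / dt))
    means = np.empty(steps + 1)
    means[0] = xi.mean()
    for s in range(steps):
        # Part 1: within-container dynamics, xi <- xi + dt * xi * (xi* - xi).
        xi = xi + dt * xi * (xi_star - xi)
        # Part 2: container i divides with probability 1 - exp(-gamma dt (1 - xi_i)).
        p_div = 1.0 - np.exp(-gamma * dt * (1.0 - xi))
        parents = np.flatnonzero(rng.random(n) < p_div)
        # Each offspring overwrites a randomly chosen container, so n stays fixed.
        targets = rng.integers(0, n, size=parents.size)
        xi[targets] = xi[parents]
        means[s + 1] = xi.mean()
    return dt * np.arange(steps + 1), means

times, mean_xi = simulate(n=1000, gamma=5.0)
print(mean_xi[0], mean_xi[-1])
```

With `xi_star=1.0` (no mutations) the mean starts near 0.5 and, over this time span, moves towards small values for large `gamma` and towards 1 for small `gamma`.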



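The model behaves smoothly in $\mu$ as $\mu \to 0$, so there is
very little difference between the dynamics for, e.g.,
$\mu = 10^{\ldots}$ and $\mu = 0$. Hence, in order to simplify the
analysis we take $\mu = 0$ in all simulations, so that $Q = 1$
and $\xi^* = 1$. The theoretical predictions are then

$$\psi(t,\xi) = \frac{(\gamma - 1)(1 - e^{-t})}{1 - e^{-(\gamma - 1)t}} \left[ 1 - (1 - e^{-t})\, \xi \right]^{\gamma - 2} \tag{12}$$

for the distribution of $\xi$ and

$$\langle \xi \rangle = \frac{1}{\gamma} \left[ \frac{\gamma + (1 - \gamma)\, e^{-t}}{1 - e^{-t}} + \frac{1 - \gamma}{1 - e^{-(\gamma - 1)t}} \right] \tag{13}$$

for the expected value of $\xi$.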
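The predicted mean can be evaluated directly. The sketch below assumes the closed form $\langle\xi\rangle = \frac{1}{\gamma}\left[\frac{\gamma + (1-\gamma)e^{-t}}{1 - e^{-t}} + \frac{1-\gamma}{1 - e^{-(\gamma-1)t}}\right]$ for $\mu = 0$ and a uniform initial distribution, which is this text's reading of the prediction; the function name is illustrative and the case $\gamma = 1$ is excluded because it requires a separate limit.

```python
import math

def expected_xi(t, gamma):
    """Assumed closed form of the mean of xi at time t > 0 (mu = 0)."""
    if gamma == 1.0:
        raise ValueError("gamma = 1 requires a separate limit")
    first = (gamma + (1.0 - gamma) * math.exp(-t)) / (1.0 - math.exp(-t))
    second = (1.0 - gamma) / (1.0 - math.exp(-(gamma - 1.0) * t))
    return (first + second) / gamma

for gamma in (0.5, 1.2, 2.0, 5.0):
    print(gamma, [round(expected_xi(t, gamma), 4) for t in (0.5, 2.0, 8.0)])
```

Two limits serve as sanity checks: for $\gamma > 1$ the mean tends to $1/\gamma$ at large $t$, and for $\gamma = 2$ it stays at $1/2$ for all $t$, since the uniform distribution is then preserved.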

In Fig. 3 we show the expected value of $\xi$ as a function of
time for parameter values corresponding to the three regions,
$0 < \gamma < 1$, $1 < \gamma < 2$ and $\gamma > 2$, and the border case $\gamma = 2$.
Also shown is the theoretical prediction (13). For $t < 8$,
the simulation results and theory agree, but for larger $t$ all
simulation curves rise to $\langle \xi \rangle = 1$.

We identify two sources of the differences between the
simulated model and the infinite population size model in
the previous section. First, in the simulations, the
production of new containers by division is a stochastic
process, as opposed to the deterministic growth term in (6).
Hence, the deterministic time evolution of $\psi(t,\xi)$ in (6)
acquires a noise term, which in turn causes $\psi(t,\xi)$ to be
smeared out.

Second, in addition to the effects of stochastic division of
containers, we have the consequences of small differences
in the initial distribution from one run to the next. When $t$
is small, it is still a good approximation to say that
$\psi_0(\xi)$ is a smooth function. However, when $t$ is large the
distribution $\psi(t,\xi)$ depends on the initial distribution only
in a small interval close to $\xi = 0$ (see Fig. 2). Hence, in a
finite population we can no longer assume that $\psi_0(\xi)$ is
smooth when considered at such a small scale: at long times
the dynamics will then be dominated by the containers with
the smallest $\xi(0)$. Note how this directly relates to the
third case discussed at the end of the last section.

The question is now: which of these two effects is the
dominant cause of the deviations observed from the
theoretical predictions? In order to answer this question,
suppose the initial distribution is uniform on the interval
$[\epsilon, 1]$. According to (9) the distribution is then

$$\psi_\epsilon(t,\xi) = \frac{(\gamma - 1)(1 - e^{-t}) \left[ 1 - (1 - e^{-t})\, \xi \right]^{\gamma - 2}}{(1 - \epsilon + \epsilon e^{t})^{1 - \gamma} - e^{-(\gamma - 1)t}} \tag{14}$$

on the interval $[\xi_{\min}(t), 1]$ and zero outside, where
$\xi_{\min}(t) = \epsilon / [\epsilon + (1 - \epsilon)\, e^{-t}]$.
The corresponding expected value is

$$\langle \xi_\epsilon \rangle = \frac{1}{\gamma}\, \frac{\gamma + (1 - \gamma)\, e^{-t}}{1 - e^{-t}} + \frac{1}{\gamma}\, \frac{(1 - \gamma)(1 - \epsilon)\, e^{-t}}{\epsilon + (1 - \epsilon)\, e^{-t} - [\epsilon + (1 - \epsilon)\, e^{-t}]^{\gamma}}. \tag{15}$$

In Fig. 4 we show the expected value of $\xi$, for the same
parameters as in Fig. 3, with (15) averaged over the
distribution $n(1 - \epsilon)^{n-1}$ corresponding to the minimum of $n$
randomly chosen uniformly distributed numbers. This model
captures the transition from $\langle \xi \rangle$ to 1, but cannot explain the
simulation results completely: we attribute the remaining
differences to the effect of stochastic fluctuations of the
dynamics.
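The averaging over the smallest initial value can be sketched as follows. The code assumes the expected value for a uniform start on $[\epsilon, 1]$ in the form $\frac{1}{\gamma}\frac{\gamma + (1-\gamma)e^{-t}}{1 - e^{-t}} + \frac{1}{\gamma}\frac{(1-\gamma)(1-\epsilon)e^{-t}}{D - D^{\gamma}}$ with $D = \epsilon + (1-\epsilon)e^{-t}$, and averages it over the density $n(1-\epsilon)^{n-1}$ by a midpoint rule in the variable $u = (1-\epsilon)^n$, which is uniform on $(0, 1)$. The function names, the grid size and the value of `n` are illustrative assumptions, and $\gamma = 1$ is excluded.

```python
import math

def expected_xi_eps(t, gamma, eps):
    """Assumed closed form of the mean of xi for a uniform start on [eps, 1]."""
    e = math.exp(-t)
    d = eps + (1.0 - eps) * e
    first = (gamma + (1.0 - gamma) * e) / (1.0 - e)
    second = (1.0 - gamma) * (1.0 - eps) * e / (d - d ** gamma)
    return (first + second) / gamma

def averaged_expected_xi(t, gamma, n, grid=20000):
    """Average over eps distributed as the minimum of n uniform numbers.

    With u = (1 - eps)^n uniform on (0, 1), eps = 1 - u^(1/n)."""
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) / grid
        eps = 1.0 - u ** (1.0 / n)
        total += expected_xi_eps(t, gamma, eps)
    return total / grid

for t in (2.0, 6.0, 12.0):
    print(t, averaged_expected_xi(t, 5.0, n=500, grid=2000))
```

For $\epsilon = 0$ the function reduces to the infinite-population prediction, and at long times the average approaches 1, which is the transition described above.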

Figure 5: Distribution of $\xi$ for $\gamma = 1.1$ and $\gamma = 5$ at
different times $t$. Average histograms from 100 simulations
are shown as a stair-case plot. Also shown are $\psi(t,\xi)$ from
(12) (dashed and dotted line) and $\psi_\epsilon(t,\xi)$ averaged over the
distribution of $\epsilon$ (solid line).

Comparing the distribution of $\xi$, estimated from the
simulations, to the theoretical prediction (12) and the
second model (14) (Fig. 5) supports this conclusion.

Finally, we consider under which circumstances one can
expect (12) to predict the distribution of $\xi$. For
$\epsilon < 10^{\ldots}$, $\langle \xi_\epsilon \rangle \approx \langle \xi \rangle$ when $t < \log(0.1/\epsilon)$. For larger
$t$, $\langle \xi_\epsilon \rangle$ approaches 1. In the limit $\epsilon \to 0$, the population
converges to the stationary solution at $t \approx 4$. Hence, in
order for the population to have time to approach the
stationary solution before the finite size effects take
over, $\epsilon^{-1}$, i.e. the number of replicators in a typical
container, must be at least 500.
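The size estimate follows from requiring that the window $t < \log(0.1/\epsilon)$ extends at least to the convergence time $t \approx 4$. The arithmetic, written out as a sketch:

```latex
\log\!\left( \frac{0.1}{\epsilon} \right) \ge 4
\;\Longleftrightarrow\;
\epsilon \le 0.1\, e^{-4} \approx 1.8 \times 10^{-3}
\;\Longleftrightarrow\;
\epsilon^{-1} \ge 10\, e^{4} \approx 546,
```

which is of the order of the bound of 500 quoted above.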

Conclusions 

The deterministic model is a good description of the
dynamics of the system during a transient. Most importantly,
the condition $R \gtrless 2(A - 1)$ decides whether the distribution
is dominated by fast or slow replicators. The long-term
dynamics, on the other hand, is determined mostly by the
distribution of the smallest values of $\xi$ in the initial
population. The stochastic division process influences the
dynamics, but has very little qualitative effect (at least
for large population sizes). These are only preliminary
results; the effects of finite containers, stochastic growth
processes and other complications to the model need to be
investigated further before a conclusive answer can be
obtained.